Bringing Parallel Performance to Python with Domain-Specific Selective Embedded Just-in-Time Specialization

Authors

  • Shoaib Kamil
  • Derrick Coetzee
  • Armando Fox
Abstract

Today’s productivity programmers, such as scientists who need to write code to do science, are typically forced to choose between productive and maintainable code with modest performance (e.g. Python plus native libraries such as SciPy [SciPy]) or complex, brittle, hardware-specific code that entangles application logic with performance concerns but runs two to three orders of magnitude faster (e.g. C++ with OpenMP, CUDA, etc.). The dynamic features of modern productivity languages like Python enable an alternative approach that bridges the gap between productivity and performance. SEJITS (Selective, Embedded, Just-in-Time Specialization) embeds domain-specific languages (DSLs) in high-level languages like Python for popular computational kernels such as stencils, matrix algebra, and others. At runtime, the DSLs are “compiled” by combining expert-provided source code templates specific to each problem type, plus a strategy for optimizing an abstract syntax tree representing a domain-specific but language-independent representation of the problem instance. The result is efficiency-level (e.g. C, C++) code callable from Python whose performance equals or exceeds that of handcrafted code, plus performance portability by allowing multiple code generation strategies within the same specializer to target different hardware present at runtime, e.g. multicore CPUs vs. GPUs. Application writers never leave the Python world, and we do not assume any modification or support for parallelism in Python itself. We present Asp (“Asp is SEJITS for Python”) and initial results from several domains. We demonstrate that domain-specific specializers allow highly productive Python code to obtain performance meeting or exceeding expert-crafted low-level code on parallel hardware, without sacrificing maintainability or
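The template-plus-AST idea described in the abstract can be illustrated with a toy sketch. This is not Asp's API: the names `C_TEMPLATE`, `to_c`, and `specialize` are hypothetical, the "DSL" is limited to arithmetic on a single array variable `x`, and the sketch only generates C source text rather than compiling and loading it.

```python
import ast

# Hypothetical expert-provided template for an elementwise kernel
# (an assumption for illustration; not taken from Asp).
C_TEMPLATE = """\
void {name}(const double *x, double *out, int n) {{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {{
        out[i] = {body};
    }}
}}
"""

# Python AST operator types mapped to their C spellings.
_C_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_c(node):
    """Translate a restricted Python expression AST into a C expression."""
    if isinstance(node, ast.BinOp) and type(node.op) in _C_OPS:
        return f"({to_c(node.left)} {_C_OPS[type(node.op)]} {to_c(node.right)})"
    if isinstance(node, ast.Name) and node.id == "x":
        return "x[i]"  # the single array input of the toy DSL
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return repr(float(node.value))
    # Anything else is outside the toy DSL; a real system could fall
    # back to running the original Python code instead.
    raise NotImplementedError(f"outside the toy DSL: {ast.dump(node)}")


def specialize(name, expr):
    """Parse a Python expression and fill the C template with its translation."""
    tree = ast.parse(expr, mode="eval")
    return C_TEMPLATE.format(name=name, body=to_c(tree.body))


if __name__ == "__main__":
    print(specialize("axpb", "2 * x + 1"))
```

Calling `specialize("axpb", "2 * x + 1")` returns C source for a parallel loop over `x`; an expression the toy translator does not understand (for example `x ** 2`) raises `NotImplementedError`, which is where a selective system would leave the code in Python.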


Similar resources

Efficient Parallel Graph Algorithms in Python

Domain experts in a variety of fields utilize large-scale graph analysis; however, creating high-performance parallel graph applications currently involves expertise in both graph theory and parallel programming which might not be available to the domain specialist. This project explores methods for bringing efficient parallel performance to graph applications written in Python using selective ...


SEJITS: Getting Productivity and Performance With Selective Embedded JIT Specialization

Today’s “high productivity” programming languages such as Python lack the performance of harder-to-program “efficiency” languages (CUDA, Cilk, C with OpenMP) that can exploit extensive programmer knowledge of parallel hardware architectures. We combine efficiency-language performance with productivity-language programmability using selective embedded just-in-time specialization (SEJITS). At runti...


Enabling Inter-Machine Parallelism in High-Level Languages with SEJITS and MapReduce

Selective, embedded, just-in-time specialization (SEJITS) is a technique for optimizing embedded domain-specific languages through the use of specializers, or code modules developed by expert programmers that target particular accelerators such as multicore processors and GPUs via just-in-time compilation. We extend SEJITS to exploit inter-machine parallelism by targeting clusters of machines via...


CUDA-level Performance with Python-level Productivity for Gaussian Mixture Model Applications

Typically, scientists with computational needs prefer to use high-level languages such as Python or MATLAB; however, large computationally intensive problems must eventually be recoded in a low-level language such as C or Fortran by expert programmers in order to achieve sufficient performance. In addition, multiple strategies may exist for mapping a problem onto parallel hardware; unless the h...


Parallel processing of filtered queries in attributed semantic graphs

Execution of complex analytic queries on massive semantic graphs is a challenging problem in big-data analytics that requires high-performance parallel computing. In a semantic graph, vertices and edges carry attributes of various types and the analytic queries typically depend on the values of these attributes. Thus, the computation must view the graph through a filter that passes only those i...




Publication date: 2011